Maximizing the Conditional Expected Reward for Reaching the Goal

Authors

  • Christel Baier
  • Joachim Klein
  • Sascha Klüppelholz
  • Sascha Wunderlich
Abstract

The paper addresses the problem of computing maximal conditional expected accumulated rewards until reaching a target state (briefly called maximal conditional expectations) in finite-state Markov decision processes where the condition is given as a reachability constraint. Conditional expectations of this type can, e.g., stand for the maximal expected termination time of probabilistic programs with non-determinism, under the condition that the program eventually terminates, or for the worst-case expected penalty to be paid, assuming that at least three deadlines are missed. The main results of the paper are (i) a polynomial-time algorithm to check the finiteness of maximal conditional expectations, (ii) PSPACE-completeness for the threshold problem in acyclic Markov decision processes where the task is to check whether the maximal conditional expectation exceeds a given threshold, (iii) a pseudo-polynomial-time algorithm for the threshold problem in the general (cyclic) case, and (iv) an exponential-time algorithm for computing the maximal conditional expectation and an optimal scheduler.
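To make the quantity in the abstract concrete, the following minimal sketch computes a conditional expected accumulated reward for a single fixed scheduler, that is, for a Markov chain. It is not the paper's algorithm: it only evaluates the definition E[reward | goal reached] = E[reward · 1{goal reached}] / Pr[goal reached]. The names `reach_and_reward`, `conditional_expectation`, `CHAIN`, and `REWARDS` are hypothetical, the chain is assumed to be acyclic, and the reward of a state is assumed to be collected each time the state is left.

```python
from fractions import Fraction

def reach_and_reward(chain, rewards, state, goal):
    """Return (Pr[reach goal], E[accumulated reward * 1{reach goal}]) from `state`.

    Assumes an acyclic chain; states without successors are dead ends.
    """
    if state == goal:
        return Fraction(1), Fraction(0)
    p_total, e_total = Fraction(0), Fraction(0)
    for prob, nxt in chain.get(state, []):
        p, e = reach_and_reward(chain, rewards, nxt, goal)
        p_total += prob * p
        # The reward of `state` only counts on paths that still reach the goal.
        e_total += prob * (e + rewards.get(state, 0) * p)
    return p_total, e_total

def conditional_expectation(chain, rewards, state, goal):
    """Expected accumulated reward until `goal`, conditioned on reaching `goal`."""
    p, e = reach_and_reward(chain, rewards, state, goal)
    if p == 0:
        raise ValueError("the condition (reaching the goal) has probability zero")
    return e / p

# Hypothetical example chain: from s0 the run fails with probability 1/2.
HALF = Fraction(1, 2)
CHAIN = {
    "s0": [(HALF, "s1"), (HALF, "fail")],
    "s1": [(HALF, "goal"), (HALF, "s2")],
    "s2": [(Fraction(1), "goal")],
}
REWARDS = {"s0": 1, "s1": 2, "s2": 4}

result = conditional_expectation(CHAIN, REWARDS, "s0", "goal")
```

In this example the goal is reached with probability 1/2, along two paths of probability 1/4 each with accumulated rewards 3 and 7, so `result` is 5. Maximizing this ratio over all schedulers of a Markov decision process is the harder problem the paper studies; simply enumerating memoryless choices would not be sufficient in general.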


Similar resources

Influence of depression symptoms on history-independent reward and punishment processing.

Prior research indicates that depressed individuals are less responsive to rewards and more sensitive to punishments than non-depressed individuals. This study examines decision-making under reward maximizing or punishment minimizing conditions among adults with low (n=47) or high (n=48) depression symptoms. We utilized a history-independent decision-making task where learning is experience-bas...


The Risk and Rewards of Minimizing Shortfall Probability

Many different investment objectives and criteria have been suggested for choosing investment strategies. In a static setting, Markowitz [1952] suggests the mean-variance approach. Economic theory more formally postulates that an individual investor would choose an investment strategy to maximize expected utility of wealth and/or consumption. In other settings, other criteria might ...


The Influence of Risk Aversion on Visual Decision Making

The ability to decide between multiple fixation targets in complex visual environments is essential for our survival. Evolution has refined this process to be both rapid and cheap, allowing us to perform over 100,000 saccades a day. Previous models for visual decision making have focused on maximizing reward magnitude or expected value (EV = probability of reward × magnitude of reward). However...


Complexity of Conditional Planning under Partial Observability and Infinite Executions

The computational properties of many classes of conditional and contingent planning are well known. The main division in the field is between probabilistic planning (typically infinite or unbounded executions, reward rather than goal-based, and focus on expected costs or rewards) and non-probabilistic planning (ignoring probabilities, focus on plans that reach goal states.) In this work, we add...


Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes

Multi-Armed Bandit (MAB) problems can be naturally extended to Markov Decision Processes (MDP). We extend the Best Arm Identification problem to episodic fixed-horizon MDPs. Here, the goal of an agent interacting with the MDP is to reach a high confidence on the optimal policy in as few episodes as possible. We propose Posterior Sampling for Pure Exploration (PSPE), a Bayesian algorithm for pur...



Publication date: 2017